Abstract

Hypercontractive inequalities are a useful tool in dealing with extremal questions in the
geometry of high-dimensional discrete and continuous spaces. In this survey we trace a few
connections between different manifestations of hypercontractivity, and also present some
relatively recent applications of these techniques in computer science.

1 Preliminaries and notation 

Fourier analysis on the hypercube. We define the inner product $\langle f,g\rangle = \mathbb{E}_x f(x)g(x)$ on
functions $f,g\colon \{-1,1\}^n \to \mathbb{R}$, where the expectation is taken over the uniform (normalized
counting) measure on $\{-1,1\}^n$. The multilinear polynomials $\chi_S(x) = \prod_{i\in S} x_i$ (where $S$ ranges
over subsets of $[n]$) form an orthonormal basis under this inner product; they are called the
Fourier basis. Thus, for any function $f\colon \{-1,1\}^n \to \mathbb{R}$, we have $f = \sum_{S\subseteq[n]} \hat f(S)\chi_S$, where the
Fourier coefficients $\hat f(S) = \langle f,\chi_S\rangle$ obey Plancherel's relation $\sum_S \hat f(S)^2 = \mathbb{E}_x f(x)^2$ (which equals 1
when $f$ takes values in $\{-1,1\}$). It is easy to verify that $\mathbb{E}_x f(x) = \hat f(\emptyset)$
and $\mathrm{Var}_x f(x) = \sum_{S\neq\emptyset} \hat f(S)^2$.
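
The identities above can be checked by brute force on a small cube. The following is a minimal sketch, assuming the three-bit majority function as a test case; the helper names `cube`, `chi`, and `fourier` are illustrative and not part of the survey.

```python
from itertools import product, combinations

def cube(n):
    """All points of {-1, 1}^n as tuples."""
    return list(product((-1, 1), repeat=n))

def chi(S, x):
    """Character chi_S(x): the product of x_i over i in S."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier(f, n):
    """Map each subset S (a tuple of indices) to the coefficient E_x f(x) chi_S(x)."""
    pts = cube(n)
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: sum(f(x) * chi(S, x) for x in pts) / len(pts) for S in subsets}

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1        # majority of three bits (assumed example)
coeffs = fourier(maj, n)

pts = cube(n)
mean = sum(maj(x) for x in pts) / len(pts)
second_moment = sum(maj(x) ** 2 for x in pts) / len(pts)
parseval = sum(c * c for c in coeffs.values())             # sum of squared coefficients
variance = sum(c * c for S, c in coeffs.items() if S)      # same sum without S = empty set
```

For majority on three bits the only nonzero coefficients sit on the three singletons and on the full set, so the squared coefficients sum to the second moment of a Boolean-valued function.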

Norms. For $1 \le p < \infty$, define the $\ell_p$ norm $\|f\|_p = (\mathbb{E}_x |f(x)|^p)^{1/p}$. These norms are monotone
in $p$: for every function $f$, $p \ge q$ implies $\|f\|_p \ge \|f\|_q$. For a linear operator $M$ carrying
functions $f\colon \{-1,1\}^n \to \mathbb{R}$ to functions $Mf = g\colon \{-1,1\}^n \to \mathbb{R}$, we define the $p$-to-$q$ operator norm
$\|M\|_{p\to q} = \sup_f \|Mf\|_q/\|f\|_p$. $M$ is said to be a contraction from $\ell_p$ to $\ell_q$ when $\|M\|_{p\to q} \le 1$.
Because of the monotonicity of norms, a contraction from $\ell_p$ to $\ell_p$ is automatically a contraction
from $\ell_p$ to $\ell_q$ for any $q < p$. When $q > p$ and $\|M\|_{p\to q} \le 1$, then $M$ is said to be hypercontractive.

Convolution operators. Letting $xy$ represent the coordinatewise product of $x,y \in \{-1,1\}^n$,
we define the convolution $(f*g)(x) = \mathbb{E}_y f(y)g(xy)$ of two functions $f,g\colon \{-1,1\}^n \to \mathbb{R}$, and note
that $f \mapsto f*g$ is a linear operator for every fixed $g$. Convolution is commutative and associative,
and the Fourier coefficients of a convolution satisfy the useful property $\widehat{f*g} = \hat f\,\hat g$. We shall be
particularly interested in the convolution properties of the following functions:

• The Dirac delta $\delta\colon \{-1,1\}^n \to \mathbb{R}$, given by $\delta(1,\ldots,1) = 2^n$ and $\delta(x) = 0$ otherwise (the
factor $2^n$ compensates for the normalization of the uniform measure). It is the identity for
convolution and has $\hat\delta(S) = 1$ for all $S \subseteq [n]$.

• The edge functions $h_i\colon \{-1,1\}^n \to \mathbb{R}$ given by

$$h_i(x) = \begin{cases} 2^{n-1} & x = (1,\ldots,1) \\ -2^{n-1} & x_i = -1,\ x_{[n]\setminus\{i\}} = (1,\ldots,1) \\ 0 & \text{otherwise.} \end{cases}$$

$\hat h_i(S)$ is 1 or 0 according as $S$ contains or does not contain $i$, respectively. For any function
$f\colon \{-1,1\}^n \to \mathbb{R}$, $(f*h_i)(x) = (f(x) - f(y))/2$, where $y$ is obtained from $x$ by flipping just
the $i$th bit. Convolution with $h_i$ acts as an orthogonal projection (as we can easily see in the
Fourier domain), so for any functions $f,g\colon \{-1,1\}^n \to \mathbb{R}$, we have
$\langle f*h_i, g\rangle = \langle f, h_i*g\rangle = \langle f*h_i, g*h_i\rangle$.

• The Bonami-Gross-Beckner noise functions $\mathrm{BG}_\rho\colon \{-1,1\}^n \to \mathbb{R}$ for $0 \le \rho \le 1$, where
$\widehat{\mathrm{BG}}_\rho(S) = \rho^{|S|}$ and we define $0^0 = 1$. These functions form a semigroup under convolution,
because $\mathrm{BG}_\sigma * \mathrm{BG}_\rho = \mathrm{BG}_{\sigma\rho}$ and $\mathrm{BG}_1 = \delta$. Note that
$\mathrm{BG}_\rho(x) = \sum_S \rho^{|S|}\chi_S(x) = \prod_i (1 + \rho x_i)$. We define the noise
operator $T_\rho$ acting on functions on the discrete cube by $T_\rho f = \mathrm{BG}_\rho * f$. In combinatorial
terms, $(T_\rho f)(x)$ is the expected value of $f(y)$, where $y$ is obtained from $x$ by independently
flipping each bit of $x$ with probability $(1-\rho)/2$.
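
The two descriptions of the noise operator (damping the coefficient of $\chi_S$ by $\rho^{|S|}$, and averaging $f$ over random bit flips) can be compared numerically. This is a minimal sketch under the assumption that each bit is flipped independently with probability $(1-\rho)/2$, which is the rate that yields the multiplier $\rho^{|S|}$; the function names and the test polynomial are hypothetical.

```python
from itertools import product, combinations

def cube(n):
    return list(product((-1, 1), repeat=n))

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

def noise_fourier(f, n, rho):
    """T_rho f computed by scaling the coefficient of chi_S by rho^|S|."""
    pts = cube(n)
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    coeffs = {S: sum(f[x] * chi(S, x) for x in pts) / len(pts) for S in subsets}
    return {x: sum(rho ** len(S) * c * chi(S, x) for S, c in coeffs.items())
            for x in pts}

def noise_flips(f, n, rho):
    """T_rho f(x) = E f(y), with each bit of x flipped independently w.p. (1 - rho)/2."""
    pts = cube(n)
    out = {}
    for x in pts:
        total = 0.0
        for y in pts:
            flips = sum(1 for a, b in zip(x, y) if a != b)
            prob = ((1 - rho) / 2) ** flips * ((1 + rho) / 2) ** (n - flips)
            total += prob * f[y]
        out[x] = total
    return out

n, rho = 3, 0.6
# Assumed test function: x_1 + 2 x_2 x_3 - x_1 x_2 x_3, stored as a table.
f = {x: x[0] + 2 * x[1] * x[2] - x[0] * x[1] * x[2] for x in cube(n)}
via_fourier = noise_fourier(f, n, rho)
via_flips = noise_flips(f, n, rho)
max_gap = max(abs(via_fourier[x] - via_flips[x]) for x in cube(n))
```

At the all-ones point the test polynomial evaluates to $\rho + 2\rho^2 - \rho^3$ after noise, which gives a quick sanity check on both routines.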

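Lemma 1. $\frac{d}{d\rho}\mathrm{BG}_\rho = \frac{1}{\rho}\,\mathrm{BG}_\rho * \sum_{i\in[n]} h_i$.

Proof. This is easy in the Fourier basis:

$$\frac{d}{d\rho}\widehat{\mathrm{BG}}_\rho(S) = \big(\rho^{|S|}\big)' = |S|\,\rho^{|S|-1} = \frac{\rho^{|S|}}{\rho}\sum_{i\in[n]} \hat h_i(S). \qquad \Box$$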



2 The Bonami-Gross-Beckner Inequality 

2.1 Poincaré and Log-Sobolev inequalities

The Poincaré and logarithmic Sobolev inequalities both relate a function's global non-constantness
to how fast it changes "locally". The amount of local change is quantified by the energy $\mathbb{D}(f,f)$,
where the Dirichlet form $\mathbb{D}$ is defined as

$$\mathbb{D}(f,g) = \frac{1}{2}\,\mathbb{E}_{xy\in E}\,(f(x)-f(y))(g(x)-g(y))$$

($E$ is the set of pairs $x,y$ that differ in a single coordinate). In terms of the edge functions $h_i$,
observe that $\mathbb{D}(f,g) = \frac{2}{n}\sum_i \langle f*h_i, g*h_i\rangle$.

In the case of the Poincaré inequality, we measure the distance of $f$ to a constant by its variance
$\mathrm{Var}(f) = \mathbb{E}(f - \mathbb{E}f)^2 = \mathbb{E}f^2 - (\mathbb{E}f)^2$. Then the Poincaré constant (of the discrete cube) is the
supremal $\lambda$ such that the inequality

$$\mathbb{D}(f,f) \ge \lambda\,\mathrm{Var}(f)$$

holds for all $f\colon \{-1,1\}^n \to \mathbb{R}$. This quantity is also the smallest nonzero eigenvalue of the Laplacian
of the discrete cube, viewed as a graph (i.e., its spectral expansion).

Another way of measuring the non-constantness of a function is to consider its entropy
$\mathrm{Ent}(f) = \mathbb{E}[f\log(f/\mathbb{E}f)]$ (where we assume $f \ge 0$ and use the convention that $0\log 0 = 0$). Note that
$\mathrm{Ent}(cf) = c\,\mathrm{Ent}(f)$ for any $c > 0$, so the entropy is homogeneous of degree 1 in its argument. Because
we are comparing the entropy with the energy (which is homogeneous of degree 2) we use the entropy
of the square of the function to define the Log-Sobolev constant: the largest $\alpha$ such that the inequality

$$\mathbb{D}(f,f) \ge \alpha\,\mathrm{Ent}(f^2)$$

holds for all $f\colon \{-1,1\}^n \to \mathbb{R}$. For the discrete cube $\{-1,1\}^n$, we have $\lambda = 2/n$ and $\alpha = 1/n$,
as we shall see below. It is interesting to ask how these quantities are related when we consider
other probability spaces equipped with a suitable Dirichlet form (for example, $d$-regular graphs
with $\mathbb{D}(f,g) = \frac12\mathbb{E}_{xy\in E}(f(x)-f(y))(g(x)-g(y))$, where the expectation is taken over all edges). Set
$f = 1 + \varepsilon g$ for a sufficiently small $\varepsilon$ and observe that $\mathrm{Var}(f) = \varepsilon^2\mathrm{Var}(g)$ and $\mathbb{D}(f,f) = \varepsilon^2\,\mathbb{D}(g,g)$,
whereas

$$\mathrm{Ent}(f^2) = \mathbb{E}\big[(1+\varepsilon g)^2\big(2\log(1+\varepsilon g) - \log\mathbb{E}[(1+\varepsilon g)^2]\big)\big] = 2\varepsilon^2\,\mathrm{Var}(g) + O(\varepsilon^3).$$

This shows that $\alpha \le \lambda/2$, which is tight in the case of the cube. However, for constant-degree
expander families (in particular, for random $d$-regular graphs with high probability) we have [DSC96,
Example 4.2] $\lambda = \Omega(1)$ but $\alpha = O(\log\log n/\log n) \ll \lambda$.
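
Both inequalities can be spot-checked on a small cube. The sketch below assumes the normalization $\mathbb{D}(f,f) = \frac12\mathbb{E}_{xy\in E}(f(x)-f(y))^2$ and the constants $\lambda = 2/n$, $\alpha = 1/n$ quoted above, and tests them on random real-valued functions; it is a numerical illustration, not a proof, and all names are illustrative.

```python
import math
import random
from itertools import product

def cube(n):
    return list(product((-1, 1), repeat=n))

def flip(x, i):
    return x[:i] + (-x[i],) + x[i + 1:]

def dirichlet(f, n):
    """D(f, f) = (1/2) * average over edges xy of (f(x) - f(y))^2."""
    pts = cube(n)
    # (x, i) ranges over all 2^n * n ordered edges, so the plain average is the edge average.
    total = sum((f[x] - f[flip(x, i)]) ** 2 for x in pts for i in range(n))
    return 0.5 * total / (len(pts) * n)

def variance(f, n):
    pts = cube(n)
    mean = sum(f[x] for x in pts) / len(pts)
    return sum((f[x] - mean) ** 2 for x in pts) / len(pts)

def entropy(g, n):
    """Ent(g) = E[g log g] - E[g] log E[g] for g >= 0, with 0 log 0 = 0."""
    pts = cube(n)
    mean = sum(g[x] for x in pts) / len(pts)
    avg = sum(g[x] * math.log(g[x]) for x in pts if g[x] > 0) / len(pts)
    return avg - (mean * math.log(mean) if mean > 0 else 0.0)

n = 4
rng = random.Random(0)
worst_poincare = worst_logsob = float("inf")
for _ in range(200):
    f = {x: rng.uniform(-1, 1) for x in cube(n)}
    squared = {x: f[x] ** 2 for x in f}
    d = dirichlet(f, n)
    worst_poincare = min(worst_poincare, d - (2 / n) * variance(f, n))
    worst_logsob = min(worst_logsob, d - (1 / n) * entropy(squared, n))

# A dictator x -> x_1 is an extremal case for the Poincare inequality.
dictator = {x: float(x[0]) for x in cube(n)}
dictator_gap = dirichlet(dictator, n) - (2 / n) * variance(dictator, n)
```

Both slack values stay nonnegative across the samples, and the dictator function attains equality in the Poincaré inequality.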

2.2 Hypercontractivity and the log-Sobolev inequality 

When $\rho \in [0,1]$, the noise operator $T_\rho$ is easily seen to contract $\ell_2$: for any $f\colon \{-1,1\}^n \to \mathbb{R}$, we
have $\|T_\rho f\|_2^2 = \sum_S \rho^{2|S|}\hat f(S)^2 \le \sum_S \hat f(S)^2 = \|f\|_2^2$. Now consider its behavior from $\ell_2$ to $\ell_q$ for
some $q > 2$. When $\rho = 1$, we have $T_1 f = f$; in particular, for $g(x) = (1 + x_1)/2$, $\|g\|_q = 1/2^{1/q} >
1/2^{1/2} = \|g\|_2$. On the other hand, $T_0 f = \mathbb{E}f$, so $\|T_0 f\|_q = |\mathbb{E}f| \le (\mathbb{E}f^2)^{1/2} = \|f\|_2$. By the intermediate
value theorem, there must be some $\rho \in (0,1)$ such that $\|T_\rho\|_{2\to q} = 1$. A theorem of Gross [Gro75]
connects this critical $\rho$ with the Log-Sobolev constant $\alpha$ of the underlying space:

Theorem 2. $\|T_\rho\|_{p\to q} \le 1$ if and only if $\rho^{-2\alpha n} \ge \frac{q-1}{p-1}$.

Stated differently, on the cube (where $\alpha n = 1$) we have $\|T_{1-\varepsilon}f\|_q \le \|f\|_2$ when
$q \le (1-\varepsilon)^{-2} + 1 \approx 2 + 2\varepsilon$. Thus to prove hypercontractive inequalities on the discrete cube, it
suffices to bound the log-Sobolev constant. We shall prove this claim for $p = 2$, which turns out to
imply the general version.

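Proof of Theorem 2. We shall prove that $\|T_\rho f\|_q \le \|f\|_2$ for $q = 1 + \rho^{-2\alpha n}$; the remainder of the
theorem can be shown using similar techniques. As we observed before, this inequality is tight
when $\rho = 1$, so it suffices to show that $\frac{d}{d\rho}\|T_\rho f\|_q \ge 0$ for $0 \le \rho \le 1$. For notational convenience, let
$G = \|T_\rho f\|_q^q$. Then

$$\frac{d}{d\rho}\|T_\rho f\|_q = \big(G^{1/q}\big)' = q^{-2}\,G^{(1/q)-1}\,\big(qG' - q'G\log G\big).$$

Now we use the fact that $G = \mathbb{E}(T_\rho f)^q$ to get

$$G' = q\,\mathbb{E}\big[(T_\rho f)^{q-1}(T_\rho f)'\big] + q'\,\mathbb{E}\big[(T_\rho f)^q\log(T_\rho f)\big].$$

Applying Lemma 3 and simplifying, we get

$$qG' - q'G\log G = q'\,\mathrm{Ent}\big((T_\rho f)^q\big) + \frac{nq^2}{2\rho}\,\mathbb{D}\big((T_\rho f)^{q-1}, T_\rho f\big).$$

We use Lemma 4 to handle the second term, and plug in $q = 1 + \rho^{-2\alpha n}$ (so that $q' = -2\alpha n\rho^{-2\alpha n-1}$):

$$qG' - q'G\log G \ge 2n\rho^{-2\alpha n-1}\Big[\mathbb{D}\big((T_\rho f)^{q/2}, (T_\rho f)^{q/2}\big) - \alpha\,\mathrm{Ent}\big((T_\rho f)^q\big)\Big],$$

whose positivity we are guaranteed by the log-Sobolev inequality applied to $(T_\rho f)^{q/2}$. □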
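
The $p = 2$ case can be probed numerically on a small cube. The sketch below assumes $\alpha n = 1$ (the value for the cube), so the critical exponent is $q = 1 + \rho^{-2}$, and it evaluates $T_\rho$ by averaging over bit flips with probability $(1-\rho)/2$. The sampled functions, the range of $\rho$, and all names are illustrative choices.

```python
import random
from itertools import product

def cube(n):
    return list(product((-1, 1), repeat=n))

def noise(f, n, rho):
    """T_rho f(x) = E f(y), each bit of x flipped independently w.p. (1 - rho)/2."""
    pts = cube(n)
    out = {}
    for x in pts:
        total = 0.0
        for y in pts:
            flips = sum(1 for a, b in zip(x, y) if a != b)
            total += ((1 - rho) / 2) ** flips * ((1 + rho) / 2) ** (n - flips) * f[y]
        out[x] = total
    return out

def norm(f, p):
    """l_p norm with respect to the uniform measure."""
    return (sum(abs(v) ** p for v in f.values()) / len(f)) ** (1 / p)

n = 4
rng = random.Random(1)
worst_ratio = 0.0
for _ in range(100):
    f = {x: rng.gauss(0, 1) for x in cube(n)}
    rho = rng.uniform(0.2, 1.0)
    q = 1 + rho ** -2                 # critical exponent for p = 2 when alpha * n = 1
    ratio = norm(noise(f, n, rho), q) / norm(f, 2)
    worst_ratio = max(worst_ratio, ratio)
```

The largest observed ratio $\|T_\rho f\|_q/\|f\|_2$ never exceeds 1, up to floating-point error.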
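Lemma 3. For any $f,g\colon \{-1,1\}^n \to \mathbb{R}$, $\big\langle g, \frac{d}{d\rho}(T_\rho f)\big\rangle = \frac{n}{2\rho}\,\mathbb{D}(g, T_\rho f)$.

Proof. Recalling Lemma 1 and the projection property of the $h_i$s, we have

$$\langle g, (T_\rho f)'\rangle = \langle g, \mathrm{BG}'_\rho * f\rangle = \Big\langle g, \frac{1}{\rho}\,\mathrm{BG}_\rho * f * \sum_i h_i\Big\rangle = \frac{1}{\rho}\sum_i \langle g*h_i, \mathrm{BG}_\rho * f\rangle = \frac{n}{2\rho}\,\mathbb{D}(g, T_\rho f). \qquad \Box$$

Lemma 4. For any $f\colon \{-1,1\}^n \to \mathbb{R}_{\ge 0}$ and $q \ge 2$, $\mathbb{D}(f, f^{q-1}) \ge \frac{4(q-1)}{q^2}\,\mathbb{D}\big(f^{q/2}, f^{q/2}\big)$.

Proof. It suffices to show that $(a^{q-1} - b^{q-1})(a - b) \ge \frac{4(q-1)}{q^2}\big(a^{q/2} - b^{q/2}\big)^2$ for all $a > b \ge 0$ and
$q \ge 2$. But observe that

$$\Big(\int_b^a t^{q/2-1}\,dt\Big)^2 = \frac{4}{q^2}\big(a^{q/2} - b^{q/2}\big)^2 \qquad\text{and}\qquad \int_b^a t^{q-2}\,dt\int_b^a dt = \frac{1}{q-1}\big(a^{q-1} - b^{q-1}\big)(a-b),$$

and the inequality between the integrals follows from convexity (the Cauchy-Schwarz inequality). □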
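
The pointwise inequality behind Lemma 4 is easy to test on a grid. This is a small sketch assuming the constant $4(q-1)/q^2$ that the Cauchy-Schwarz argument produces; the grid and the list of exponents are arbitrary illustrative choices.

```python
def lemma4_gap(a, b, q):
    """LHS minus RHS of (a^(q-1) - b^(q-1))(a - b) >= 4(q-1)/q^2 * (a^(q/2) - b^(q/2))^2."""
    lhs = (a ** (q - 1) - b ** (q - 1)) * (a - b)
    rhs = 4 * (q - 1) / q ** 2 * (a ** (q / 2) - b ** (q / 2)) ** 2
    return lhs - rhs

grid = [k / 10 for k in range(0, 31)]          # values of a and b in [0, 3]
exponents = [2, 2.5, 3, 4, 6]
min_gap = min(lemma4_gap(a, b, q)
              for q in exponents for a in grid for b in grid if a > b)
```

For $q = 2$ both sides equal $(a-b)^2$, so the gap vanishes there and is strictly positive for larger exponents.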
2.3 Two-point inequality 

We begin by showing that the log-Sobolev inequality holds for the uniform distribution on the
two-point space $\{-1,1\}$ with $\alpha = 1$. Without loss of generality, consider $f(x) = 1 + sx$. Then

$$\mathrm{Ent}(f^2) = \tfrac12(1+s)^2\log(1+s)^2 + \tfrac12(1-s)^2\log(1-s)^2 - (1+s^2)\log(1+s^2)$$

and $\mathbb{D}(f,f) = 2s^2$. We shall show that $\phi(s) = \mathbb{D}(f,f) - \alpha\,\mathrm{Ent}(f^2)$ is non-negative for $-1 \le s \le 1$.
By symmetry it suffices to consider $s \ge 0$. But $\phi(0) = 0$ and

$$\phi'(s) = 4s + 2s\log(1+s^2) + 2(1-s)\log(1-s) - 2(1+s)\log(1+s),$$

which is non-negative because $\phi'(0) = 0$ and

$$\phi''(s) = \frac{4s^2}{s^2+1} + 2\log\frac{1+s^2}{1-s^2} \ge 0.$$
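
The function $\phi$ can be evaluated directly to see where the constant becomes tight. The sketch below takes $\alpha = 1$ as the two-point constant (consistent with $\alpha = 1/n$ for $n = 1$) and also shows that a slightly larger constant fails near $s = 0$; the helper names are illustrative.

```python
import math

def xlogx(t):
    """t log t with the convention 0 log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0

def phi(s, alpha=1.0):
    """phi(s) = D(f, f) - alpha * Ent(f^2) for f(x) = 1 + s*x on {-1, 1}."""
    energy = 2 * s * s
    ent = 0.5 * xlogx((1 + s) ** 2) + 0.5 * xlogx((1 - s) ** 2) - xlogx(1 + s * s)
    return energy - alpha * ent

grid = [k / 1000 for k in range(-1000, 1001)]
min_phi = min(phi(s) for s in grid)          # should be 0, attained at s = 0
too_large = phi(0.1, alpha=1.1)              # a larger constant violates the inequality
```

Since $\mathrm{Ent}(f^2) = 2s^2 + O(s^4)$ near zero, any constant above 1 makes $\phi$ negative for small $s$, which is what the last line illustrates.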
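2.4 Tensoring property

Theorem 5. Let $\alpha$ be the log-Sobolev constant of $\{-1,1\}^n$. Then the log-Sobolev constant of
$\{-1,1\}^{2n}$ is $\alpha/2$.

When $n$ is a power of 2, we can conclude inductively that $\alpha = 1/n$; a proof along similar lines
works for arbitrary $n$ as well.

Proof of Theorem 5. For any $f\colon \{-1,1\}^n \times \{-1,1\}^n \to \mathbb{R}$, set $g(x) = \|f(x,\cdot)\|_2$. Then by the
conditional entropy formula,

$$\mathrm{Ent}(f^2) \le \mathrm{Ent}(g^2) + \mathbb{E}_x\,\mathrm{Ent}\big(f(x,\cdot)^2\big) \le \frac{1}{\alpha}\,\mathbb{D}(g,g) + \frac{1}{\alpha}\,\mathbb{E}_x\,\mathbb{D}_y\big(f(x,y), f(x,y)\big)$$

and by convexity,

$$\mathbb{D}(g,g) = \frac12\,\mathbb{E}_{x\sim x'}\big(g(x) - g(x')\big)^2 \le \frac12\,\mathbb{E}_{x\sim x'}\mathbb{E}_y\big[(f(x,y) - f(x',y))^2\big] = \mathbb{E}_y\,\mathbb{D}_x\big(f(x,y), f(x,y)\big)$$

where the notation $x \sim x'$ ranges over edges of $\{-1,1\}^n$. Taken together, these give

$$\mathrm{Ent}(f^2) \le \frac{\mathbb{E}_x\,\mathbb{D}_y(f(x,y), f(x,y)) + \mathbb{E}_y\,\mathbb{D}_x(f(x,y), f(x,y))}{\alpha} = \frac{2\,\mathbb{D}(f,f)}{\alpha} = \frac{\mathbb{D}(f,f)}{\alpha/2}$$

as claimed. □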



2.5 Non-product groups 

Recall that we defined the Dirichlet form

$$\mathbb{D}(f,g) = \frac12\,\mathbb{E}_{uv\in E}\,(f(u)-f(v))(g(u)-g(v))$$

for functions $f,g\colon \{-1,1\}^n \to \mathbb{R}$, but it makes sense for any regular graph if we sample $u,v$
uniformly from the edges. Thus, given any family of regular graphs, we can ask if they satisfy a
log-Sobolev inequality of the form $\mathbb{D}(f,f) \ge \alpha\,\mathrm{Ent}(f^2)$ for all suitable $f$.

It turns out that the relationship between logarithmic Sobolev inequalities and hypercontractive
noise operator semigroups, as stated by Gross [Gro75], holds for a wide class of spaces, not just the
hypercube $\{-1,1\}^n$. Diaconis and Saloff-Coste [DSC96] explored an intermediate between these
two extremes of specialization to give improved mixing time results for Markov chains on various
graphs.

One of the first discrete applications of hypercontractivity was a celebrated theorem of Kahn,
Kalai and Linial [KKL88] relating the maximum influence of a function on the hypercube to its
variance. In Theorem 7, we discuss some recent work [OW09b] of O'Donnell and Wimmer
generalizing the KKL theorem to apply to the wider class of Schreier graphs associated with group
actions (defined below).

An action of a group $G$ on a set $X$ is a homomorphism from $G$ to the group of bijections on $X$,
and we write $x^g$ for the image of $x$ under the bijection for $g$. If $S$ is a set of generators for $G$, then
the Schreier graph $\mathrm{Sch}(G,S,X)$ has vertex set $X$ and edges $(x, x^g)$ for all $x \in X$ and $g \in S$. It is
known that every connected regular graph of even degree can be obtained in this way [Gro77]. The
definition of the Dirichlet form $\mathbb{D}$ generalizes without change, but to be able to derive a log-Sobolev
inequality for this space, we must define the noise operator $T_\rho$ in an appropriate fashion to satisfy
the claim of Lemma 3: $\big\langle g, \frac{d}{d\rho}(T_\rho f)\big\rangle \propto \frac{1}{\rho}\,\mathbb{D}(g, T_\rho f)$.






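3 Boolean-Valued Functions

3.1 Influences

Write $x_{-i}$ for the collection of random variables $\{x_1,\ldots,x_n\} \setminus \{x_i\}$. The influence of the $i$th
coordinate on a function $f\colon \{-1,1\}^n \to \mathbb{R}$ is given by

$$\mathrm{Inf}_i(f) = \mathbb{E}_{x_{-i}}\mathrm{Var}_{x_i} f(x) = \mathbb{E}_{x_{-i}}\Big[\mathbb{E}_{x_i} f(x)^2 - \big(\mathbb{E}_{x_i} f(x)\big)^2\Big].$$

When $f$ is Boolean-valued, this quantity is just the probability that changing $x_i$ changes $f(x)$.
Writing $f$ in the Fourier basis, we have $\mathbb{E}_{x_{-i}}\mathbb{E}_{x_i} f(x)^2 = \mathbb{E}_x f(x)^2 = \sum_S \hat f(S)^2$ and
$\mathbb{E}_{x_{-i}}(\mathbb{E}_{x_i} f(x))^2 = \sum_{S\not\ni i} \hat f(S)^2$, so that $\mathrm{Inf}_i(f) = \sum_{S\ni i} \hat f(S)^2 = \mathbb{E}(f*h_i)^2$. In addition, we define
the total influence $\mathrm{Inf}(f) = \sum_i \mathrm{Inf}_i(f) = \sum_S |S|\,\hat f(S)^2$.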
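
The combinatorial and Fourier descriptions of influence can be compared directly. The following sketch assumes three-bit majority as the example function; `influence_combinatorial` uses the identity $(f*h_i)(x) = (f(x) - f(y))/2$, with $y$ obtained from $x$ by flipping bit $i$, and `influence_fourier` sums squared coefficients over sets containing $i$. Both names are illustrative.

```python
import math
from itertools import product, combinations

def cube(n):
    return list(product((-1, 1), repeat=n))

def flip(x, i):
    return x[:i] + (-x[i],) + x[i + 1:]

def influence_combinatorial(f, n, i):
    """Inf_i(f) = E[((f(x) - f(x with bit i flipped)) / 2)^2]."""
    pts = cube(n)
    return sum(((f(x) - f(flip(x, i))) / 2) ** 2 for x in pts) / len(pts)

def influence_fourier(f, n, i):
    """Inf_i(f) = sum of squared Fourier coefficients over the sets S containing i."""
    pts = cube(n)
    total = 0.0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if i in S:
                coeff = sum(f(x) * math.prod(x[j] for j in S) for x in pts) / len(pts)
                total += coeff ** 2
    return total

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1      # assumed example: majority of three bits
by_flips = [influence_combinatorial(maj, n, i) for i in range(n)]
by_fourier = [influence_fourier(maj, n, i) for i in range(n)]
total_influence = sum(by_flips)
```

Each voter in a three-way majority is pivotal exactly when the other two disagree, which happens with probability 1/2, and both computations reproduce that value.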

3.2 Structural results 

Boolean functions are natural combinatorial objects, but they were first studied from an analytical
viewpoint in work on voting and social choice. In this setting, a function $f\colon \{-1,1\}^n \to \{-1,1\}$
is viewed as a way to combine the preferences of $n$ voters to yield the result of the election.
This explains the notions of dictator or junta functions, which depend on only one or a few of their
coordinates, respectively. In this context it is also natural to consider functions where no coordinate
("voter") has a very large influence. Kahn, Kalai, and Linial [KKL88] first introduced the Fourier
analysis of Boolean functions as a technique in computer science. Their theorem establishes that if
a function is far from a constant (i.e., has variance at least a constant), then it must have a variable
of influence $\Omega(\frac{\log n}{n})$. We state a strengthening of their original inequality due to Talagrand [Tal95]:

Theorem 6 ([KKL88, Tal95]). For any $f\colon \{-1,1\}^n \to \{-1,1\}$,

$$\sum_i \frac{\mathrm{Inf}_i(f)}{\log(1/\mathrm{Inf}_i(f))} \ge \Omega(1)\cdot\mathrm{Var}(f).$$

We can compare this to the Poincaré inequality on the cube, which can be stated as

$$\sum_i \mathrm{Inf}_i(f) \ge \Omega(1)\cdot\mathrm{Var}(f).$$

(In particular, there exists a variable of influence $\Omega(\frac1n)\,\mathrm{Var}(f)$.) The KKL theorem is a stronger
result of the same form: it is a comparison between a local and a global measure of variation. The
proofs of KKL and Talagrand used the hypercontractivity of the cube, but we present here a more
recent proof due to Rossignol that uses the log-Sobolev inequality instead. For simplicity we'll just
show the weaker statement that the maximum influence is $\Omega(\frac{\log n}{n})\,\mathrm{Var}(f)$.

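Proof. Write $f - \mathbb{E}f = f_1 + \cdots + f_n$, where $f_j = \sum_{S:\,\max S = j} \hat f(S)\chi_S$. For each $f_j$, the log-Sobolev
inequality states that $\mathbb{D}(f_j,f_j) \ge \alpha\,\mathrm{Ent}(f_j^2) = \frac1n\,\mathrm{Ent}(f_j^2)$. By writing $\mathbb{D}(f_j,f_j)$ in terms of the
Fourier coefficients $\hat f(S)$, we can check that $\mathbb{D}(f,f) = \sum_{j=1}^n \mathbb{D}(f_j,f_j)$. We sum these
inequalities to obtain

$$n\,\mathbb{D}(f,f) \ge \sum_j \mathrm{Ent}(f_j^2) = \underbrace{\sum_j \mathbb{E}\big[f_j^2\log(f_j^2)\big]}_{A} + \underbrace{\sum_j \mathbb{E}f_j^2\,\log\frac{1}{\mathbb{E}f_j^2}}_{B}.$$

In order to bound $B$, we begin by noting that

$$\mathbb{E}f_j^2 = \sum_{S:\,\max S = j} \hat f(S)^2 \le \sum_{S\ni j} \hat f(S)^2 = \mathbb{E}(f*h_j)^2,$$

where the $h_j$s are the edge functions we defined earlier. Letting $M(f) = \max_j \mathbb{E}(f*h_j)^2 =
\max_j \mathrm{Inf}_j(f)$, we have

$$B = \sum_j \mathbb{E}f_j^2\,\log\frac{1}{\mathbb{E}f_j^2} \ge \sum_j \mathbb{E}f_j^2\,\log\frac{1}{M(f)} = \mathrm{Var}(f)\,\log\frac{1}{M(f)},$$

where we have used the orthogonality of the $f_j$s and the fact that $\mathrm{Var}(f) = \sum_{S\neq\emptyset} \hat f(S)^2$.
To bound $A$, we split it up further:

$$A = \underbrace{\sum_j \mathbb{E}\big[f_j^2\log(f_j^2)\cdot 1_{f_j^2\le t}\big]}_{A_1} + \underbrace{\sum_j \mathbb{E}\big[f_j^2\log(f_j^2)\cdot 1_{f_j^2> t}\big]}_{A_2}.$$

For $0 < t \le 1/e^2$, we have that $\sqrt t\log\sqrt t$ is a nonpositive decreasing function and therefore,

$$A_1 = 2\sum_j \mathbb{E}\big[|f_j|\cdot|f_j|\log|f_j|\cdot 1_{f_j^2\le t}\big] \ge 2\sqrt t\log\sqrt t\,\sum_j \mathbb{E}\big[|f_j|\cdot 1_{f_j^2\le t}\big] \ge \sqrt t\log t\,\sum_j \mathbb{E}|f_j|.$$

By comparing Fourier coefficients, it is easy to verify that $f_j = \mathbb{E}_{x_{j+1},\ldots,x_n}(f*h_j)$. Therefore, by
convexity, $\mathbb{E}|f_j| \le \mathbb{E}|f*h_j|$.

Until now, the proof has made no use of the fact that $f$ takes on only Boolean values. Now
we argue that because $f(x) \in \{-1,1\}$, we must have $(f*h_j)(x) \in \{-1,0,1\}$, so that
$\mathbb{E}|f*h_j| = \mathbb{E}(f*h_j)^2$. Plugging this into our bound for $A_1$ yields

$$A_1 \ge \sqrt t\log t\,\sum_j \mathbb{E}(f*h_j)^2 = \frac n2\,\sqrt t\log t\cdot\mathbb{D}(f,f).$$

For $A_2$, note that $\log(\cdot)$ is increasing, so

$$A_2 \ge \log t\,\sum_j \mathbb{E}f_j^2 = \log t\cdot\mathrm{Var}(f).$$

Summing all these bounds gives us

$$n\,\mathbb{D}(f,f) \ge \log\frac{1}{M(f)}\,\mathrm{Var}(f) + \frac n2\,\sqrt t\log t\cdot\mathbb{D}(f,f) + \log t\cdot\mathrm{Var}(f).$$

By the Poincaré inequality, $\mathbb{D}(f,f) \ge \frac2n\,\mathrm{Var}(f)$, so we can set $t = \big(\frac{2\,\mathrm{Var}(f)}{e\,n\,\mathbb{D}(f,f)}\big)^2 \le 1/e^2$. With this
substitution, the above inequality becomes

$$\frac{2}{e\sqrt t} \ge \log\frac{t^{1+1/e}}{M(f)}.$$

Suppose $t \le \big(\frac{4}{e\log n}\big)^2$. Then

$$\mathbb{D}(f,f) = \frac{2}{e\,n\sqrt t}\,\mathrm{Var}(f) \ge \frac{2}{en}\cdot\frac{e\log n}{4}\,\mathrm{Var}(f) = \frac{\log n}{2n}\,\mathrm{Var}(f),$$

and we know that $M(f) \ge \frac12\,\mathbb{D}(f,f)$. On the other hand, if $t > \big(\frac{4}{e\log n}\big)^2$, then

$$M(f) \ge t^{1+1/e}\exp\Big(\frac{-2}{e\sqrt t}\Big) \ge \Big(\frac{4}{e\log n}\Big)^{2+2/e}\exp\Big(\frac{-\log n}{2}\Big) \gg \frac{\log n}{n}. \qquad \Box$$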

We are now in a position to state the recent result of O'Donnell and Wimmer [OW09b]
generalizing the KKL theorem to Schreier graphs satisfying a certain technical property.

Theorem 7 ([OW09b]). Let $G$ be a group acting on a set $X$, $U \subseteq G$ be a union of conjugacy classes
that generates $G$, and $\alpha$ be the log-Sobolev constant of $\mathrm{Sch}(G,U,X)$. Then for any $f\colon X \to \{-1,1\}$,

$$\frac{\mathbb{E}_{u\in U}\,\mathrm{Inf}_u(f)}{\log(1/\max_{u\in U}\mathrm{Inf}_u(f))} \ge \Omega(\alpha\,\mathrm{Var}(f)).$$

In particular, there is some $u \in U$ such that $\mathrm{Inf}_u(f) \ge \Omega(\alpha\log\frac1\alpha)\,\mathrm{Var}(f)$.

For an Abelian group such as $\mathbb{Z}_2^n$ (the cube), every group element is in a conjugacy class by
itself, so the extra condition on $U$ is vacuous. Using $\alpha = \Theta(\frac1n)$ for the cube, we recover the
original KKL theorem. O'Donnell and Wimmer apply the generalized result to the non-Abelian group
$S_n$ of permutations on $[n]$, generated by transpositions and acting on the family $\binom{[n]}{k}$ of $k$-subsets
of $[n]$. By viewing these families as sets of $n$-bit strings, they recover a "rigidity" version of the
Kruskal-Katona theorem that states (roughly) that if a subset of a layer of a cube has a small
expansion to the layer above it, then it must be correlated to some dictator function.



Coding theoretic interpretation. In the long code, an integer $i \in [n]$ is encoded as the dictator
function $(x_1,\ldots,x_n) \mapsto x_i$. By using many more bits ($2^n$ rather than $\log n$) of redundant storage,
we hope to be able to recover from corruptions in the data. The theorem tells us that as long
as the corrupted version of an encoding is far from a constant function, it can be decoded to
a coordinate whose influence is $\Omega(\log n)$ times the average influence. Since every coordinate's
influence is nonnegative, Markov's inequality shows that only $O(n/\log n)$ coordinates can have
influence this large. Thus, we have a "small" set of candidate long codes to which we might decode
the word. To complete this picture, we'd like to understand how far the word can be from functions
that depend only on these coordinates; the following theorem of Friedgut, which we state without
proof, furnishes this information.

Theorem 8 ([Fri98]). For every $f\colon \{-1,1\}^n \to \{-1,1\}$ and $0 < \varepsilon < 1$, there is a function
$g\colon \{-1,1\}^n \to \{-1,1\}$ depending on at most $\exp(O(\mathrm{Inf}(f)/\varepsilon))$ variables such that $\mathbb{E}|f - g| \le \varepsilon$.



4 Gaussian isoperimetry and an algorithmic application 

Hypercontractive inequalities were first investigated in the context of Gaussian probability spaces,
for their applications to quantum field theory. The following simple proof reduces the continuous
Gaussian hypercontractive inequality to its discrete counterpart on the cube.

4.1 From the central limit theorem to Gaussian hypercontractivity

Theorem 9 ([Gro75]). Let $x \in \mathbb{R}$ be normally distributed, i.e.,

$$\Pr[x \in A] = \frac{1}{\sqrt{2\pi}}\int_A \exp\Big(-\frac{x^2}{2}\Big)\,dx.$$

Then for a smooth function $f\colon \mathbb{R} \to \mathbb{R}$, the random variable $F = f(x)$ satisfies

$$\mathbb{D}(F,F) \ge \alpha\,\mathrm{Ent}(F^2)$$

with $\alpha = 1$ and $\mathbb{D}(F,G) = 2\,\mathbb{E}\big(\frac{dF}{dx}\,\frac{dG}{dx}\big)$.

Proof. We shall approximate the Gaussian distribution by a weighted sum of Bernoulli variables.
Let $y \in \{-1,1\}^k$ be uniformly distributed, and set $g(y) = \frac{y_1 + \cdots + y_k}{\sqrt k}$. By the log-Sobolev inequality
applied to $f\circ g(y)$, we have $k\,\mathbb{D}(f\circ g(y), f\circ g(y)) \ge \mathrm{Ent}((f\circ g(y))^2)$. By the central limit theorem,
the right side converges to $\mathrm{Ent}(f(x)^2) = \mathrm{Ent}(F^2)$ as $k \to \infty$, so it remains to show that the left
side converges to $\mathbb{D}(F,F)$ as well. Let $y|_{y_i=b}$ be the value obtained by replacing the $i$th coordinate
of $y$ with the value $b$, and observe that $g(y|_{y_i=1}) - g(y|_{y_i=-1}) = 2/\sqrt k$. Then, using the smoothness
of $f$, we have

$$\big|(h_i * (f\circ g))(y)\big| = \frac12\,\big|f\circ g(y|_{y_i=1}) - f\circ g(y|_{y_i=-1})\big| = \frac{1}{\sqrt k}\,|f'\circ g(y)| + o\Big(\frac{1}{\sqrt k}\Big)$$

so that

$$k\,\mathbb{D}\big(f\circ g(y), f\circ g(y)\big) = 2\,\mathbb{E}_y\Big[\sum_i (h_i * (f\circ g))(y)^2\Big] = 2\,\mathbb{E}_y\big[f'\circ g(y)^2 + o(1)\big].$$

The second term vanishes as $k \to \infty$, and the first term converges to $\mathbb{D}(F,F)$ by the Central Limit
Theorem. □

The tensoring property of log-Sobolev inequalities lets us extend this result to Gaussian
distributions over $\mathbb{R}^d$. We are also interested in the corresponding noise operator $S_\rho$, known as the
Ornstein-Uhlenbeck operator, which is given by

$$S_\rho f(x) = \mathbb{E}_{z\sim N(0,1)^d}\, f\big(\rho x + \sqrt{1-\rho^2}\,z\big).$$

Theorem 2 has an analog in this setting, which lets us conclude that every function $f\colon \mathbb{R}^d \to \mathbb{R}$
satisfies $\|S_\rho f\|_q \le \|f\|_p$ where $q \ge p \ge 1$ and $\rho^{-2} \ge (q-1)/(p-1)$.



4.2 Reverse hypercontractivity and isoperimetry 

In 1982, Borell showed a reversed inequality of a similar form when $q < p < 1$:

Theorem 10 (Reverse hypercontractivity, [Bor82]). Fix $q < p < 1$ and $\rho \ge 0$ such that
$\rho^{-2} \ge (q-1)/(p-1)$. Then for any positive-valued function $f\colon \mathbb{R}^d \to \mathbb{R}_+$, we have $\|S_\rho f\|_q \ge \|f\|_p$.

Note that the expressions $\|\cdot\|_p$ are not norms when $p < 1$; in particular, they are not convex.
However, this theorem can be proved by means similar to our proof for the Gaussian log-Sobolev
inequality: we start with a base result for the 2-point space, proceed by tensoring to the hypercube,
and use the central limit theorem to cover Gaussian space.

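As an application of Borell's result, consider the following strong isoperimetry theorem for
Gaussian space (due to Sherman).

Theorem 11 (Gaussian isoperimetry, [She09]). Let $u,u' \in \mathbb{R}^d$ be independent Gaussian random
variables. Then for any set $A \subseteq \mathbb{R}^d$ of Gaussian measure $\mu(A)$ and any $\tau > 0$, we have

$$\Pr_u\Big[\Pr_{u'}\big[\rho u + \sqrt{1-\rho^2}\,u' \in A\big] \le \tau\Big] \le \frac{\tau^{1-\rho}}{\mu(A)}.$$

Proof. When $\mu(A) \le \tau^{1-\rho}$ there is nothing to prove. Otherwise, let $f$ be the indicator function of
$A$ and observe that $\Pr_{u'}[\rho u + (1-\rho^2)^{1/2}u' \in A] = S_\rho f(u)$. Therefore, for $q = 1 - 1/\rho < 0$, we have

$$\Pr_u\Big[\Pr_{u'}\big[\rho u + \sqrt{1-\rho^2}\,u' \in A\big] \le \tau\Big] = \Pr_u\big[S_\rho f(u) \le \tau\big] = \Pr_u\big[(S_\rho f(u))^q \ge \tau^q\big] \le \frac{\mathbb{E}_u (S_\rho f(u))^q}{\tau^q}$$

by an application of Markov's inequality. But $\mathbb{E}_u(S_\rho f(u))^q$ is just $\|S_\rho f\|_q^q$, and we know by Borell's
theorem that $\|S_\rho f\|_q \ge \|f\|_p = \mu(A)^{1/p}$ for $p = 1 - \rho$. Thus

$$\Pr_u\Big[\Pr_{u'}\big[\rho u + \sqrt{1-\rho^2}\,u' \in A\big] \le \tau\Big] \le \frac{\mu(A)^{q/p}}{\tau^q} = \Big(\frac{\tau^{1-\rho}}{\mu(A)}\Big)^{1/\rho} \le \frac{\tau^{1-\rho}}{\mu(A)},$$

where we have used the facts that $q < 0$ and $\rho \le 1$. □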



4.3 Fast graph partitioning and the constructive Big Core Theorem 

Problem and SDP rounding algorithm. In the $c$-balanced separator problem, we are given a
graph $G = (V,E)$ on $n$ vertices and asked to find the smallest set of edges such that their removal
disconnects the graph into pieces of size at most $cn$. The problem is NP-hard, and the best known
approximation ratio¹ is $O(\sqrt{\log n})$.

The first algorithm to achieve this bound was based on a semidefinite program that assigns a
unit vector to each vertex and minimizes the total embedded squared length of the edges subject to
the constraint that the vertices are spread out and that the squared distances between the points
form a metric:

$$\begin{array}{lll}
\text{minimize} & \sum_{ij\in E} \|x_i - x_j\|^2 & \\
\text{subject to} & \|x_i\|^2 = 1 & \forall i \in V \\
 & \sum_{i,j} \|x_i - x_j\|^2 \ge c(1-c)\,n^2 & \\
 & \|x_i - x_j\|^2 + \|x_j - x_k\|^2 \ge \|x_i - x_k\|^2 & \forall i,j,k \in V
\end{array}$$

¹ For technical reasons, it is actually a pseudo-approximation: the algorithm's output for $c$ is compared to the
optimal value for $c' \neq c$.

To round this SDP, Arora, Rao and Vazirani [ARV09] pick a random direction $u$ and project all
the points along $u$. They then define sets $A$ and $B$ consisting of points $x$ whose projections are
sufficiently large, i.e., $A = \{x \mid \langle x,u\rangle < -K\}$ and similarly $B = \{x \mid \langle x,u\rangle > K\}$, where $K$ is chosen
to make $A$ and $B$ have size $\Omega(n)$ with high probability. Next, they discard points $a \in A$, $b \in B$ such
that $\|a - b\|$ is much smaller than expected for a pair whose projections are $\ge 2K$ apart. Finally,
if the resulting pruned sets $A' \subseteq A$ and $B' \subseteq B$ are large enough, they show that greedily growing
$A'$ yields a good cut.
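
The projection and pruning steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: the direction `u`, the threshold `K`, and the pruning cutoff `min_sq_dist` are supplied by the caller (in [ARV09] they come from the analysis), and `prune` removes close pairs greedily, which is one simple way to realize the discarding step rather than the authors' exact procedure.

```python
def dot(x, u):
    return sum(a * b for a, b in zip(x, u))

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def project_and_split(points, u, K):
    """Index sets A = {i : <x_i, u> < -K} and B = {i : <x_i, u> > K}."""
    A = [i for i, x in enumerate(points) if dot(x, u) < -K]
    B = [i for i, x in enumerate(points) if dot(x, u) > K]
    return A, B

def prune(points, A, B, min_sq_dist):
    """Repeatedly discard a pair (a, b) whose squared distance is below min_sq_dist."""
    A, B = list(A), list(B)
    while True:
        pair = next(((a, b) for a in A for b in B
                     if sq_dist(points[a], points[b]) < min_sq_dist), None)
        if pair is None:
            return A, B
        A.remove(pair[0])
        B.remove(pair[1])

# Toy instance: five unit vectors in the plane and the direction u = e_1.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-0.6, 0.8)]
u = (1.0, 0.0)
A, B = project_and_split(points, u, K=0.5)
A_pruned, B_pruned = prune(points, A, B, min_sq_dist=2.0)
```

In the toy instance the pair of points at $(\pm 0.6, 0.8)$ is separated by the projection but close in squared distance, so it is the pair that pruning removes.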

Matchings and cores. The key step in making this argument work is to ensure that not too
many pairs $(a,b)$ are removed in the pruning step. To bound the probability of this bad event, we
consider the possibility that for a large fraction $\delta = \Omega(1)$ of directions $u$, there exists a matching of
points $M_u$ such that each pair $(a,b) \in M_u$ is short (i.e., $\|a - b\| \le \ell = O(1/\sqrt{\log n})$) but stretched
along $u$ (i.e., $|\langle a - b, u\rangle| \ge \sigma = \Omega(1)$). Such a set of points is called a $(\sigma,\delta,\ell)$-core. The big core
theorem (first proved with optimal parameters by Lee [Lee05]) asserts that this situation can't
arise: for a fixed $\sigma$, $\delta$, and $\ell$, we must have $n \gtrsim \exp\big(\sigma^2/(\ell^2\log^2(1/\delta))\big)$, which is a contradiction for
our chosen values of $\sigma$, $\delta$, $\ell$.

In order to prove the big core theorem, Lee concatenates pairs that share a point and belong in
matchings for nearby directions. The existence of a long chain of such concatenations is what leads
to a contradiction: if we consider the endpoints $a,b$ of a chain of length $p$, the projection $|\langle a - b, u\rangle|$
grows linearly in $p$ whereas the distance $\|a - b\|$ grows only as $\sqrt p$ (recall that the SDP constrained
the squared distances to form a metric).

Boosting. The matching chaining argument we have just presented in its simple form doesn't
work, for the following reason. At each chaining step, the fraction of nearby directions available for
our use reduces by roughly $1 - \delta$ (by a union bound) so that we are rapidly left with no direction to
move in. To remedy this situation, we need to boost the fraction of usable directions at each step,
say from $\delta/2$ to $1 - \delta/2$, so that we can carry on chaining in spite of a $1 - \delta$ loss. Lee's proof uses the
standard isoperimetric inequality for the sphere to show that this boosting can be performed with
no change in $\ell$ and a very small penalty in $\sigma$. In other words, we take advantage of the fact that
a very small dilation of a set of constant measure (i.e., the set of available directions) has measure
close to 1.

Faster algorithms. Lee's big core theorem is non-constructive in the sense that it only shows
the existence of such a long chain of matched pairs in order to give a contradiction. While this
form suffices to bound the approximation ratio of the ARV rounding scheme, other variants of
their technique require a way to efficiently sample long chains, not just show their existence.
Sherman constructs a distribution over directions that does not depend on the point set at all, yet
is guaranteed to always have a non-trivial probability of producing long chains of stretched pairs.
More precisely:

Theorem 12 (Constructive big core [She09]). For any $1 \le R \le \Theta(\sqrt{\log n})$, there is $P \ge
\Omega(R^2/\log n)$ and an efficiently sampleable distribution $\mu$ over the set of sequences of $\le P$ direction
vectors (each in $\mathbb{R}^d$), such that: for any $(\sigma,\delta,\ell)$-core $M$, if the string of directions is sampled from
$\mu$, the expected number of chains whose endpoints are $\ge P\ell$ apart is at least $\exp(-O(P^2))\,n$.

We sketch some of the ideas of the proof here. Sherman constructs two sequences of Gaussian
directions $u_1,\ldots,u_P$ and $w_1,\ldots,w_P$. Each $w_i$ is an independent Gaussian vector, whereas each
$u_i$ for $i > 1$ is a Gaussian vector $\rho$-correlated with $u_{i-1}$. Finally, the distribution $\mu$ is given by
randomly shuffling together the $u_i$ and $w_i$, picking a uniformly random $R$ between 1 and $P$, and
returning the first $R$ elements of the shuffled sequence. The correlated directions $u_i$ correspond to
the steps in which Lee's proof chained pairs from similar directions, whereas the independent $w_i$
correspond to the region-growing steps necessary for boosting. By randomly interleaving these two
types of moves, Sherman's sampling algorithm can be oblivious to the actual point set it is acting
on.



5 Complexity theoretic applications 

5.1 Dictatorship testing with perfect completeness 

Definitions. A function $f\colon \{-1,1\}^n \to \mathbb{R}$ is said to be $(\varepsilon,\delta)$-quasirandom if $\hat f(S)^2 \le \varepsilon$ whenever
$|S| \le 1/\delta$. In order to show that a given problem is hard to approximate, we often need to design
a test that

• performs $q$ queries on a black-box function $f$,

• accepts every dictator function with probability $\ge c$ (the completeness probability), and

• accepts every $(\varepsilon,\delta)$-quasirandom function with probability $\le s$ (the soundness probability).

A test is said to be adaptive if each query is allowed to depend on the result of the queries so far.

While dictatorship tests for the $c < 1$ setting have been known for over a decade (first from
the work of Håstad and more recently via the Unique Games Conjecture of Khot), there were no
nontrivial bounds for $c = 1$ until some recent results of O'Donnell and Wu. Their analysis, which
we show below, relies heavily on the hypercontractive inequality.

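Theorem 13 ([OW09a]). For every $n > 0$, there is a 3-query non-adaptive test that accepts
every dictator function $(x_1,\ldots,x_n) \mapsto x_i$ with probability $c = 1$ but accepts any $(\delta, \delta/\log(1/\delta))$-
quasirandom odd function $f\colon \{-1,1\}^n \to [-1,1]$ with probability $\le s = 5/8 + O(\sqrt\delta)$.

The proof uses the following strengthening of the hypercontractive inequality for restricted
parameter values.

Lemma 14. If $0 \le \rho \le 1$, $q \ge 2$, and $0 \le \lambda \le 1$ satisfy $\rho^\lambda \le 1/\sqrt{q-1}$, then for all $f\colon \{-1,1\}^n \to
\mathbb{R}$, $\|T_\rho f\|_q \le \|T_\rho f\|_2^{1-\lambda}\,\|f\|_2^{\lambda}$.

Proof. By hypercontractivity of $T_{\rho^\lambda}$ from $\ell_2$ to $\ell_q$, followed by Hölder's inequality,

$$\|T_\rho f\|_q^2 = \|T_{\rho^\lambda}T_{\rho^{1-\lambda}} f\|_q^2 \le \|T_{\rho^{1-\lambda}} f\|_2^2 = \sum_S \big(\rho^{2|S|}\hat f(S)^2\big)^{1-\lambda}\big(\hat f(S)^2\big)^{\lambda} \le \|T_\rho f\|_2^{2-2\lambda}\,\|f\|_2^{2\lambda}. \qquad \Box$$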
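
Lemma 14 can be sanity-checked numerically for the exponent $q = 3$ used later in the soundness analysis. The sketch below assumes the hypothesis $\rho^\lambda \le 1/\sqrt{q-1}$ and picks the smallest admissible $\lambda$ for each sampled $\rho$; restricting to $\rho \le 0.7$ keeps that $\lambda$ below 1. The sampling ranges and helper names are illustrative.

```python
import math
import random
from itertools import product

def cube(n):
    return list(product((-1, 1), repeat=n))

def noise(f, n, rho):
    """T_rho f(x) = E f(y), each bit of x flipped independently w.p. (1 - rho)/2."""
    pts = cube(n)
    out = {}
    for x in pts:
        total = 0.0
        for y in pts:
            flips = sum(1 for a, b in zip(x, y) if a != b)
            total += ((1 - rho) / 2) ** flips * ((1 + rho) / 2) ** (n - flips) * f[y]
        out[x] = total
    return out

def norm(f, p):
    return (sum(abs(v) ** p for v in f.values()) / len(f)) ** (1 / p)

n, q = 4, 3
rng = random.Random(2)
worst_ratio = 0.0
largest_lambda = 0.0
for _ in range(100):
    f = {x: rng.uniform(-1, 1) for x in cube(n)}
    rho = rng.uniform(0.05, 0.7)
    # Smallest lambda satisfying rho^lambda <= 1/sqrt(q - 1).
    lam = math.log(math.sqrt(q - 1)) / math.log(1 / rho)
    largest_lambda = max(largest_lambda, lam)
    g = noise(f, n, rho)
    bound = norm(g, 2) ** (1 - lam) * norm(f, 2) ** lam
    worst_ratio = max(worst_ratio, norm(g, q) / bound)
```

The ratio of the left side to the right side of the lemma stays at most 1 on every sample.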
Proof of Theorem 13. Define the "not-two" predicate $\mathrm{NTW}\colon \{-1,1\}^3 \to \{-1,1\}$ as follows:
$\mathrm{NTW}(a,b,c) = 1$ if exactly two of $a,b,c$ equal $-1$, and $\mathrm{NTW}(a,b,c) = -1$ otherwise. Explicitly,

a            -1  -1  -1  -1   1   1   1   1
b            -1  -1   1   1  -1  -1   1   1
c            -1   1  -1   1  -1   1  -1   1
NTW(a,b,c)   -1   1   1  -1   1  -1  -1  -1

Let $\delta \in [0,1]$ be a parameter to be fixed later. For $i = 1,\ldots,n$, we pick bits $x_i,y_i,z_i \in \{-1,1\}$ as
follows:

• with probability $1 - \delta$: we choose $x_i, y_i$ uniformly and independently, then set $z_i = -x_iy_i$;

• with probability $\delta$: we choose $x_i$ uniformly, then set $y_i = z_i = x_i$.

Note that for $i \neq j$, $(x_i,y_i,z_i)$ is independent of $(x_j,y_j,z_j)$. We accept if $\mathrm{NTW}(f(x),f(y),f(z)) = -1$.
It is immediate from the construction of $x_i,y_i,z_i$ that $\mathrm{NTW}(x_i,y_i,z_i) = -1$ for $i = 1,\ldots,n$. Therefore,
if $f$ is a dictator function, it follows that $\mathrm{NTW}(f(x),f(y),f(z))$ must also equal $-1$.
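
Perfect completeness can be confirmed by enumerating the support of the per-coordinate distribution. This is a minimal sketch: it lists every triple $(x_i,y_i,z_i)$ that either branch of the test can produce, ignoring the probabilities, and checks the predicate on all of them. The function names are illustrative.

```python
from itertools import product

def ntw(a, b, c):
    """'Not-two' predicate: +1 if exactly two of the inputs equal -1, else -1."""
    return 1 if (a, b, c).count(-1) == 2 else -1

def coordinate_support():
    """All triples (x_i, y_i, z_i) that the test can generate in a single coordinate."""
    noisy = {(x, y, -x * y) for x in (-1, 1) for y in (-1, 1)}   # branch with prob. 1 - delta
    equal = {(x, x, x) for x in (-1, 1)}                          # branch with prob. delta
    return noisy | equal

support = coordinate_support()
values_on_support = {ntw(*t) for t in support}

# A dictator f(x) = x_i only looks at the i-th triple, so it is accepted on every outcome.
n = 3
dictators_always_accepted = all(
    ntw(x[i], y[i], z[i]) == -1
    for triples in product(sorted(support), repeat=n)
    for x, y, z in [tuple(zip(*triples))]
    for i in range(n)
)
```

The support contains five distinct triples (the all-minus triple arises in both branches), and the predicate evaluates to $-1$ on each of them.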

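Soundness. It remains to analyze the test when $f$ is quasirandom. We begin by writing NTW
in the Fourier basis: $\mathrm{NTW} = -\frac14\chi_\emptyset - \frac14(\chi_{\{1\}} + \chi_{\{2\}} + \chi_{\{3\}}) - \frac14(\chi_{\{1,2\}} + \chi_{\{2,3\}} + \chi_{\{1,3\}}) + \frac34\chi_{\{1,2,3\}}$.
Therefore, by symmetry,

$$\mathbb{E}_{x,y,z}\,\mathrm{NTW}(f(x),f(y),f(z)) = -\frac14 - \frac34\,\mathbb{E}_x f(x) - \frac34\,\mathbb{E}_{x,y} f(x)f(y) + \frac34\,\mathbb{E}_{x,y,z} f(x)f(y)f(z).$$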
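
The Fourier expansion of NTW can be recomputed exactly with rational arithmetic. The sketch below indexes the three inputs by 0, 1, 2 (rather than 1, 2, 3) and also derives the baseline acceptance probability $(1 - \hat{\mathrm{NTW}}(\emptyset))/2$ that remains when all the correlation terms vanish; the names are illustrative.

```python
from fractions import Fraction
from itertools import product, combinations

def ntw(a, b, c):
    """'Not-two' predicate: +1 if exactly two of the inputs equal -1, else -1."""
    return 1 if (a, b, c).count(-1) == 2 else -1

def ntw_coefficient(S):
    """Fourier coefficient of NTW on the subset S of {0, 1, 2}."""
    total = 0
    for x in product((-1, 1), repeat=3):
        term = ntw(*x)
        for i in S:
            term *= x[i]
        total += term
    return Fraction(total, 8)

coeffs = {S: ntw_coefficient(S) for k in range(4) for S in combinations(range(3), k)}

# The test accepts when NTW = -1, so Pr[accept] = (1 - E[NTW]) / 2.
baseline_acceptance = (1 - coeffs[()]) / 2
```

The baseline value is the constant 5/8 that appears in the soundness bound of Theorem 13.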

We shall systematically rewrite the right-hand side in terms of the Fourier coefficients of $f$. By
our assumption that $f$ is odd, we have $\hat f(S) = 0$ whenever $S$ has even cardinality. Therefore
$\mathbb{E}f(x) = \hat f(\emptyset) = 0$. Also,

$$\mathbb{E}_{x,y} f(x)f(y) = \sum_{S,T} \hat f(S)\hat f(T)\,\mathbb{E}_{x,y}\,\chi_S(x)\chi_T(y).$$

Consider a summand where $S \neq T$, and without loss of generality fix $i \in S \setminus T$. It is easy to see
that the contributions due to $x_i = \pm 1$ cancel each other. Thus, the only terms that remain are of
the form $S = T$, i.e.,

$$\mathbb{E}_{x,y} f(x)f(y) = \sum_S \hat f(S)^2\,\mathbb{E}_{x,y}\,\chi_S(x)\chi_S(y) = \sum_S \hat f(S)^2\Big(\mathbb{E}_{x_i,y_i}\,x_iy_i\Big)^{|S|} = \sum_S \hat f(S)^2\,\delta^{|S|}$$

where we have used the fact that $\mathbb{E}(x_iy_i) = (1-\delta)\cdot 0 + \delta\cdot 1 = \delta$. But $\hat f(S)$ is nonzero only for $|S|$
odd, and $\sum_S \hat f(S)^2 \le 1$, so we can upper-bound the above sum by $\delta$.

Bounding the cubic term. We proceed similarly: 

IE f{x)f{y)f{z) = V f{S)f{T)f{U) E Xs{x)XT{y)xu{z). (1) 

x,y,z x,y,z 

Each of the expectations can be written as a product over coordinates i G [n] using the fact that 
individual coordinates of x,y, z are chosen independently. When i belongs to exactly one of S, T, U 
(say S), then it contributes a factor Ercj = 0, making the product zero. Similarly, when i belongs 
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to two of the sets (say S,T), then the contribution is Ercjyj = 5 hy our earher calculation. Finally, 
when i belongs to all three of the sets, we have ExjyjZj = (1 — §) ■ (—1) + 5 • (0) = —(1 — 6). In 
light of this calculation, any triple S,T,U that makes a nonzero contribution to the sum (1) must 
be of the form 

S=AUBUC T=AUCUD U=AUDUB 

for suitable sets A,B,C,D C [n] where A is disjoint from B,C,D. Also \T\, \U\ must be odd, 
from which we can show that \A\ must be odd. In terms of these new sets we can rewrite 

E f{x)f{y)f{z) = - V f{AuBU C)f{A UCU D)f{A UDUB){1- 

x,y,z ^— ' 

B,C,D disj. from A 
\A\ odd 

For a fixed A, define the function qa : {-1, 1}^^^ M by = f{A n X). Then we have 

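$$\begin{aligned}
\mathbb{E}_{x,y,z} f(x)f(y)f(z)
&= -\sum_{|A| \text{ odd}} (1-\delta)^{|A|} \sum_{\substack{B,C,D \\ \text{disj. from } A}} \hat g_A(B \cup C)\sqrt{\delta}^{\,|B \cup C|} \cdot \hat g_A(C \cup D)\sqrt{\delta}^{\,|C \cup D|} \cdot \hat g_A(D \cup B)\sqrt{\delta}^{\,|D \cup B|} \\
&= -\sum_{|A| \text{ odd}} (1-\delta)^{|A|} \sum_{\substack{B,C,D \\ \text{disj. from } A}} \widehat{T_{\sqrt\delta}\, g_A}(B \cup C) \cdot \widehat{T_{\sqrt\delta}\, g_A}(C \cup D) \cdot \widehat{T_{\sqrt\delta}\, g_A}(D \cup B) \\
&= -\sum_{|A| \text{ odd}} (1-\delta)^{|A|}\, \mathbb{E}_u\bigl[(T_{\sqrt\delta}\, g_A(u))^3\bigr].
\end{aligned}$$

In absolute value, this is at most $\sum_{|A| \text{ odd}} (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, g_A\|_3^3$.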

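Write $g_A(u) = \mathbb{E}_x\, g_A(x) + \tilde g_A(u) = \hat f(A) + \tilde g_A(u)$. Then, using the inequality $|a+b|^3 \le 4(|a|^3 + |b|^3)$,
we have

$$\|T_{\sqrt\delta}\, g_A\|_3^3 = \|\hat f(A) + T_{\sqrt\delta}\, \tilde g_A\|_3^3 \le 4|\hat f(A)|^3 + 4\|T_{\sqrt\delta}\, \tilde g_A\|_3^3$$

and therefore,

$$\sum_A (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, g_A\|_3^3 \le 4\sum_A (1-\delta)^{|A|}\, |\hat f(A)|^3 + 4\sum_A (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, \tilde g_A\|_3^3.$$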

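To bound the first term, note that $\sum_A (1-\delta)^{|A|}\, |\hat f(A)|^3 \le \sum_A \hat f(A)^2 \cdot \max_A \{(1-\delta)^{|A|}\, |\hat f(A)|\}$. The sum
of the squared Fourier coefficients is just 1 (by Parseval's identity) and we can use the $(\delta, \frac{1}{\delta}\log\frac{1}{\delta})$-
pseudorandomness property to bound the quantity in the maximum: when $|A| \le \frac{1}{\delta}\log\frac{1}{\delta}$, then
$|\hat f(A)| \le \sqrt{\delta}$, and when $|A| > \frac{1}{\delta}\log\frac{1}{\delta}$ then $(1-\delta)^{|A|} \le \delta$. Thus the entire first summand is $O(\sqrt{\delta})$.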

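Hypercontractivity. It remains to bound $\sum_A (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, \tilde g_A\|_3^3$. Fix $\lambda$ of order $1/\log(1/\delta)$ and apply the
modified hypercontractive inequality:

$$\sum_A (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, \tilde g_A\|_3^3 \le \sum_A (1-\delta)^{|A|}\, \|T_{\sqrt\delta}\, \tilde g_A\|_2^{3-\lambda}\, \|\tilde g_A\|_2^{\lambda}.$$

Now, $\|\tilde g_A\|_2^{\lambda} \le 1$ and $\|T_{\sqrt\delta}\, \tilde g_A\|_2^{3-\lambda} = O(\sqrt{\delta}) \sum_{\emptyset \ne B \subseteq \bar A} \delta^{|B|}\, \hat f(A \cup B)^2$. The contribution of the
term corresponding to a pair $(A, B)$ to the sum we were trying to bound is $O(\sqrt{\delta}) \cdot \hat f(A \cup B)^2 \cdot (1-\delta)^{|A|}\, \delta^{|B|}$.
For each choice of $A \cup B$, the $(1-\delta)^{|A|}\, \delta^{|B|}$ terms sum to at most one, and all the $\hat f(A \cup B)^2$
terms themselves sum to at most one. Therefore, we have bounded the entire sum by $O(\sqrt{\delta})$ as
desired. □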
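Two elementary facts used in the proof can be sanity-checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the logarithm in the threshold $\frac{1}{\delta}\log\frac{1}{\delta}$ is the natural logarithm. The first function sums $(1-\delta)^{|A|}\delta^{|B|}$ over all ways of splitting a fixed set into $A$ and $B$ (by the binomial theorem the total is exactly one, so any sub-sum is at most one); the second checks that $(1-\delta)^{m} \le \delta$ beyond the threshold.

```python
from itertools import combinations
from math import log, ceil

def split_weight_sum(w_size, delta):
    """Sum of (1-delta)^|A| * delta^|B| over all splits of a w_size-element set into A and B."""
    total = 0.0
    for a in range(w_size + 1):
        for _A in combinations(range(w_size), a):  # one term per choice of A
            total += (1 - delta) ** a * delta ** (w_size - a)
    return total

def decay_below_delta(delta):
    """Check (1-delta)^m <= delta at the first integer m >= (1/delta) * ln(1/delta)."""
    m = ceil(log(1 / delta) / delta)
    return (1 - delta) ** m <= delta

total_weight = split_weight_sum(5, 0.2)
threshold_ok = all(decay_below_delta(d) for d in (0.01, 0.1, 0.2, 0.4))
```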
5.2 Integrality gap for Unique Label Cover SDP



Problem and SDP relaxation. In the Unique Label Cover problem, we are given a label set $L$
and a weighted multigraph $G = (V, E)$ whose edges are labeled by permutations $\{\pi_e : L \to L\}_{e \in E}$,
and are asked to find an assignment $f : V \to L$ of labels to vertices that maximizes the fraction
of edges $e\{u,v\}$ that are "consistent" with our labeling, i.e., $\pi_e(f(u)) = f(v)$. If there exists a
labeling that satisfies all the edges, then it is easy to find such a labeling. However, when all we can
guarantee is that 99% of the edges can be satisfied, it is not known how to find a labeling
satisfying even 1% of them. At the same time, present techniques cannot show that finding a
1%-consistent labeling is NP-hard.

One approach to solving this problem is to use an extension of the Goemans-Williamson SDP
for Max-Cut, where we set up a vector $v_i$ for every vertex $v$ and label $i$:

$$\begin{aligned}
\text{maximize} \quad & \mathbb{E}_{e\{u,v\}} \sum_{i \in L} \langle u_i, v_{\pi_e(i)} \rangle \\
\text{subject to} \quad & \langle u_i, v_j \rangle \ge 0 && \forall u, v \in V,\ \forall i, j \in L \\
& \sum_{i \in L} \|v_i\|^2 = 1 && \forall v \in V \\
& \langle v_i, v_j \rangle = 0 && \forall v \in V,\ \forall i \ne j \in L
\end{aligned}$$

(The expectation in the objective is over a distribution where $e\{u,v\}$ is picked with probability
proportional to its weight.) The intent is that $\|v_i\|^2$ should be the probability that $v$ receives label
$i$, and $\langle u_i, v_j \rangle$ should be the corresponding joint probability. It is easy to see that this SDP is a
relaxation of the original problem.

Gap instance. In an influential paper, Khot and Vishnoi [KV05] constructed an integrality gap
for this SDP: for a label set of size $2^k$ and an arbitrary parameter $\eta \in [0, \frac{1}{2}]$, a graph whose optimal
labeling satisfies at most a $1/2^{\eta k}$ fraction of the edges, but for which the SDP optimum is at least $1 - \eta$.
The hypercontractive inequality plays a central role in the soundness analysis, which we present
below.

Let $\mathcal{F}$ be the set of all functions $f : \{-1,1\}^k \to \{-1,1\}$ and $L$ be the Fourier basis $\{\chi_S \mid S \subseteq [k]\}$;
clearly, $|L| = 2^k$. Observe that $\mathcal{F}$ is an Abelian group under pointwise multiplication, and $L$ is a
subgroup. We take the quotient $V = \mathcal{F}/L$ to be the vertex set. Fix an arbitrary representative for
each coset and write $V = \{f_1 L, f_2 L, \ldots, f_{|V|} L\}$. We shall define a weighted edge between every pair
of these representative functions, then show how to extend this definition to all pairs of functions,
and finally map these edges to edges between cosets.

• The edge $e\{f, g\}$ has weight equal to $\Pr_{h,h'}[(f, g) = (h, h')]$, where $h, h' \in \mathcal{F}$ are drawn to be
$\rho$-correlated on every bit with uniform marginals, where $\rho = 1 - 2\eta$.

• With every edge $e\{f_i, f_j\}$ between representative functions, we associate the identity permutation.

• A non-representative function acts as if its label is assigned according to its coset's representative.
Thus, the permutation associated with $e\{f_i \chi_S, f_j \chi_T\}$ is $\chi_U \chi_S \mapsto \chi_U \chi_T$.

• In the actual graph under consideration, every edge $e\{f_i \chi_S, f_j \chi_T\}$ appears as an edge $e\{f_i L, f_j L\}$
(with the same permutation and weight).
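For a very small $k$ the vertex set can be enumerated directly. The following sketch (with hypothetical helper names, and representing each function by its table of values) builds all functions $\{-1,1\}^k \to \{-1,1\}$ for $k = 2$, the $2^k$ characters, and the cosets obtained by multiplying a function by every character.

```python
from itertools import product, combinations
from math import prod

k = 2
points = list(product([-1, 1], repeat=k))                # the domain {-1,1}^k
functions = list(product([-1, 1], repeat=len(points)))   # every f, stored as a value table

def character(S):
    """Value table of chi_S(x) = prod_{i in S} x_i."""
    return tuple(prod(x[i] for i in S) for x in points)

# The label set L: one character per subset S of [k].
labels = [character(S) for r in range(k + 1) for S in combinations(range(k), r)]

def times(f, g):
    """Pointwise product of two value tables."""
    return tuple(a * b for a, b in zip(f, g))

def coset(f):
    """The coset fL = {f * chi : chi in L}."""
    return frozenset(times(f, chi) for chi in labels)

vertices = {coset(f) for f in functions}
```

For $k = 2$ there are $2^{2^k} = 16$ functions and $2^k = 4$ labels, so the quotient has $16 / 4 = 4$ vertices, each a coset of size 4.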
Soundness analysis. Given a labeling $R : V \to L$ on the cosets, we consider the induced labeling
$\tilde R : \mathcal{F} \to L$ given by $\tilde R(f_i \chi_S) = R(f_i L)\chi_S$. From our definitions, it is clear that the objective value
attained by $R$ is precisely $\Pr_{h,h'}[\tilde R(h) = \tilde R(h')]$, where $h, h'$ are chosen as before. Fix any label $\chi_S$
and consider the indicator function $\phi : \mathcal{F} \to \{0, 1\}$ of functions that $\tilde R$ labels with $\chi_S$. Since exactly
one function in each coset gets labeled $\chi_S$, we know that $\mathbb{E}\,\phi = 1/2^k$. Therefore,

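$$\Pr_{h,h'}[\tilde R(h) = \tilde R(h') = \chi_S] = \mathbb{E}_{h,h'}[\phi(h)\phi(h')] = \langle \phi, T_\rho \phi \rangle = \|T_{\sqrt\rho}\, \phi\|_2^2,$$

which we can upper-bound (using hypercontractivity) by $\|\phi\|_{1+\rho}^2 = 1/2^{2k/(1+\rho)} \le 1/2^{(1+\eta)k}$.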
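The hypercontractive step can be illustrated by brute force on a small cube. This is a sketch under stated assumptions, not the construction itself: it works on a generic cube $\{-1,1\}^n$ with $n = 4$, takes $\phi$ to be the indicator of a randomly sampled 4-point set, and compares $\langle \phi, T_\rho \phi \rangle$ with $\|\phi\|_{1+\rho}^2 = (\mathbb{E}\,\phi)^{2/(1+\rho)}$. The function names and the sampled sets are hypothetical.

```python
from itertools import product
from math import prod
import random

def noise_stability(support, n, rho):
    """Exact <phi, T_rho phi> for the 0/1 indicator phi of `support` in {-1,1}^n."""
    agree = (1 + rho) / 2  # per-coordinate probability that h' agrees with h
    total = 0.0
    for h in support:
        for h2 in support:
            total += prod(agree if a == b else 1 - agree for a, b in zip(h, h2))
    return total / 2 ** n

def hypercontractive_bound(support, n, rho):
    """||phi||_{1+rho}^2 = (E phi)^(2/(1+rho)) for a 0/1-valued phi."""
    return (len(support) / 2 ** n) ** (2 / (1 + rho))

n, eta = 4, 0.25
rho = 1 - 2 * eta
cube = list(product([-1, 1], repeat=n))
rng = random.Random(0)
samples = [rng.sample(cube, 4) for _ in range(20)]  # indicator sets of density 1/4
gaps = [hypercontractive_bound(s, n, rho) - noise_stability(s, n, rho) for s in samples]
```

Every entry of `gaps` is nonnegative, as the inequality predicts. The exponent comparison in the text is the arithmetic fact $2/(1+\rho) = 1/(1-\eta) \ge 1 + \eta$.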